In the AI gold rush, tech firms are embracing 72-hour weeks
The recruitment website is jazzy, awash with pictures of happy young workers and festooned with upbeat mini-slogans such as "insane speed", "infinite curiosity" and "customer obsession". Read a little further down and there are promises of perks galore: competitive compensation, free meals, free gym membership, free health and dental care and so on. But then comes the catch. Each job ad contains a warning: "Please don't join if you're not excited about working ~70 hrs/week in person with some of the most ambitious people in NYC." The website belongs to Rilla, a New York-based tech business that sells AI-based systems allowing employers to monitor sales representatives when they are out and about, interacting with clients. The company has become something of a poster child for a fast-paced workplace culture known as 996, also sometimes referred to as hustle culture or grindcore.
- North America > United States > New York (0.24)
- North America > Central America (0.14)
- Asia > Japan (0.14)
- (15 more...)
- Law (1.00)
- Information Technology (1.00)
- Banking & Finance (0.94)
- (4 more...)
How Ensemble Learning Balances Accuracy and Overfitting: A Bias-Variance Perspective on Tabular Data
Abstract--Tree-based ensemble methods consistently outperform single models on tabular classification tasks, yet the conditions under which ensembles provide clear advantages--and prevent overfitting despite using high-variance base learners--are not always well understood by practitioners. We study four real-world classification problems (Breast Cancer diagnosis, Heart Disease prediction, Pima Indians Diabetes, and Credit Card Fraud detection) comparing classical single models against nine ensemble methods using five-seed repeated stratified cross-validation with statistical significance testing. Our results reveal three distinct regimes: (i) On nearly linearly separable data (Breast Cancer), well-regularized linear models achieve 97% accuracy with <2% generalization gaps; ensembles match but do not substantially exceed this performance. We systematically quantify dataset complexity through linearity scores, feature correlation, class separability, and noise estimates, explaining why different data regimes favor different model families. Cross-validated train/test accuracy and generalization-gap plots provide simple visual diagnostics for practitioners to assess when ensemble complexity is warranted. Statistical testing confirms that ensemble gains are significant on nonlinear tasks (p < 0.01) but not on near-linear data (p > 0.15). The study provides actionable guidelines for ensemble model selection in high-stakes tabular applications, with full code and reproducible experiments publicly available.

A model that almost perfectly fits its training data can still fail badly on new cases. This gap between training performance and real-world behaviour is the essence of overfitting, and it is particularly problematic in domains such as medical diagnosis and financial fraud detection, where mistakes are costly: missed tumours delay treatment, and undetected fraud translates directly into monetary loss.
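The abstract's generalization-gap diagnostic can be sketched in a few lines with scikit-learn. This is an illustrative reconstruction, not the authors' released code: it runs five-seed repeated stratified cross-validation on the Breast Cancer dataset and reports the mean train/test accuracy gap for a linear model and an ensemble.

```python
# Sketch of the generalization-gap diagnostic: repeated stratified CV
# with per-split train and test accuracy (not the authors' code).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)  # 5 seeds

for name, model in [
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
]:
    res = cross_validate(model, X, y, cv=cv, return_train_score=True)
    gap = res["train_score"].mean() - res["test_score"].mean()
    print(f"{name}: test={res['test_score'].mean():.3f} gap={gap:.3f}")
```

On this near-linear dataset both models should score around 0.95-0.97 in test accuracy, with the forest showing the larger train/test gap, which is the pattern the abstract describes for regime (i).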
- North America > United States > Wisconsin (0.04)
- Asia > India (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.90)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.60)
Impugan: Learning Conditional Generative Models for Robust Data Imputation
Mahmud, Zalish, Kotal, Anantaa, Piplai, Aritran
Incomplete data are common in real-world applications. Sensors fail, records are inconsistent, and datasets collected from different sources often differ in scale, sampling rate, and quality. These differences create missing values that make it difficult to combine data and build reliable models. Standard imputation methods such as regression models, expectation-maximization, and multiple imputation rely on strong assumptions about linearity and independence. These assumptions rarely hold for complex or heterogeneous data, which can lead to biased or over-smoothed estimates. We propose Impugan, a conditional Generative Adversarial Network (cGAN) for imputing missing values and integrating heterogeneous datasets. The model is trained on complete samples to learn how missing variables depend on observed ones. During inference, the generator reconstructs missing entries from available features, and the discriminator enforces realism by distinguishing true from imputed data. This adversarial process allows Impugan to capture nonlinear and multimodal relationships that conventional methods cannot represent. In experiments on benchmark datasets and a multi-source integration task, Impugan achieves up to 82% lower Earth Mover's Distance (EMD) and 70% lower mutual-information deviation (MI) compared to leading baselines. These results show that adversarially trained generative models provide a scalable and principled approach for imputing and merging incomplete, heterogeneous data. Our model is available at: github.com/zalishmahmud/impuganBigData2025
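The conditioning scheme the abstract describes — a generator that reconstructs missing entries from observed features, with observed values passed through untouched — can be illustrated with a small NumPy sketch. The shapes and the stand-in generator here are hypothetical; this shows only the mask-based input/output wiring such a cGAN imputer typically uses, not the Impugan implementation.

```python
# Illustrative mask-based conditioning for a cGAN imputer (hypothetical
# shapes; the generator is a random stand-in, not a trained network).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # small batch of complete samples
mask = rng.random(X.shape) > 0.3     # True where a value is "observed"
X_obs = np.where(mask, X, 0.0)       # zero out the missing entries
noise = rng.normal(size=X.shape)     # latent noise for the generator

# The generator conditions on observed values, the mask, and noise,
# and emits a full sample; only the missing slots come from its output.
gen_input = np.concatenate([X_obs, mask.astype(float), noise], axis=1)
fake = rng.normal(size=X.shape)      # stand-in for generator(gen_input)
imputed = np.where(mask, X, fake)    # keep observed, fill the rest
```

The key invariant is the final `np.where`: observed entries are never altered by the generator, so the discriminator's realism pressure applies only to the imputed slots.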
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.51)
- Government > Regional Government (0.46)
Grounding AI Explanations in Experience: A Reflective Cognitive Architecture for Clinical Decision Support
Shao, Zijian, Shen, Haiyang, Liu, Mugeng, Fu, Gecheng, Guo, Yaoqi, Wang, Yanfeng, Ma, Yun
Effective disease prediction in modern healthcare demands the twin goals of high accuracy and transparent, clinically meaningful explanations. Existing machine learning and large language model (LLM) based approaches often struggle to balance these goals. Many models yield accurate but unclear statistical outputs, while others generate fluent but statistically unsupported narratives, often undermining both the validity of the explanation and the predictive accuracy itself. This shortcoming comes from a shallow interaction with the data, preventing the development of a deep, detailed understanding similar to a human expert's. We argue that high accuracy and high-quality explanations are not separate objectives but are mutually reinforcing outcomes of a model that develops a deep, direct understanding of the data. To achieve this, we propose the Reflective Cognitive Architecture (RCA), a novel framework that coordinates multiple LLMs to learn from direct experience. RCA features an iterative rule refinement mechanism that improves its logic from prediction errors and a distribution-aware rule-check mechanism that grounds its reasoning in the dataset's global statistics. By using predictive accuracy as a signal to drive deeper comprehension, RCA builds a strong internal model of the data. We evaluated RCA on one private and two public datasets against 22 baselines. The results demonstrate that RCA not only achieves state-of-the-art accuracy and robustness with a relative improvement of up to 40% over the baseline but, more importantly, leverages this deep understanding to excel in generating explanations that are clear, logical, evidence-based, and balanced, highlighting its potential for creating genuinely trustworthy clinical decision support systems. The code is available at https://github.com/ssssszj/RCA.
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.92)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Whole-Genome Sequencing Will Change Pregnancy
At WIRED Health 2025, Orchid CEO Noor Siddiqui and the genomics pioneer George Church laid out their view of the future of genetic screening. The world of pregnancy is going to radically change, predicts Noor Siddiqui. "I think that the default way people are going to choose to have kids is via IVF and embryo screening," she said at the WIRED Health summit last week. "There's just a massive amount of risk that you can take off of the table." Siddiqui is the founder and CEO of Orchid, a biotech company that offers whole-genome screening of embryos for IVF.
- North America > United States > Texas (0.15)
- North America > United States > California (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
- (2 more...)
XSRD-Net: EXplainable Stroke Relapse Detection
Gapp, Christian, Tappeiner, Elias, Welk, Martin, Fritscher, Karl, Mangesius, Stephanie, Eisenschink, Constantin, Deisl, Philipp, Knoflach, Michael, Grams, Astrid E., Gizewski, Elke R., Schubert, Rainer
Stroke is the second most frequent cause of death worldwide, with an annual mortality of around 5.5 million. Recurrence rates of stroke are between 5 and 25% in the first year. As mortality rates for relapses are extraordinarily high (40%), it is of utmost importance to reduce the recurrence rates. We address this issue by detecting patients at risk of stroke recurrence at an early stage in order to enable appropriate therapy planning. To this end we collected 3D intracranial CTA image data and recorded concomitant heart diseases, the age and the gender of stroke patients between 2010 and 2024. We trained single- and multimodal deep learning based neural networks for binary relapse detection (Task 1) and for relapse-free survival (RFS) time prediction together with a subsequent classification (Task 2). The separation of relapse from non-relapse patients (Task 1) could be solved with tabular data (AUC on test dataset: 0.84). However, for the main task, the regression (Task 2), our multimodal XSRD-net weighted the modalities vision:tabular at 0.68:0.32 according to modality-contribution measures. The c-index with respect to relapses for the multimodal model reached 0.68, and the AUC is 0.71 for the test dataset. Finally, a deeper interpretability analysis highlighted a link between both heart diseases (tabular) and carotid arteries (vision) for the detection of relapses and the prediction of the RFS time. This is a central outcome that we strive to strengthen with ongoing data collection and model retraining.
- Europe > Austria > Tyrol > Innsbruck (0.05)
- Oceania > New Zealand (0.04)
- Oceania > Australia (0.04)
- (2 more...)
Peptidomic-Based Prediction Model for Coronary Heart Disease Using a Multilayer Perceptron Neural Network
Coronary heart disease (CHD) is a leading cause of death worldwide and contributes significantly to annual healthcare expenditures. To develop a non-invasive diagnostic approach, we designed a model based on a multilayer perceptron (MLP) neural network, trained on 50 key urinary peptide biomarkers selected via genetic algorithms. Treatment and control groups, each comprising 345 individuals, were balanced using the Synthetic Minority Over-sampling Technique (SMOTE). The neural network was trained using a stratified validation strategy. Using a network with three hidden layers of 60 neurons each and an output layer of two neurons, the model achieved precision, sensitivity, and specificity of 95.67%, with an F1-score of 0.9565. The area under the ROC curve (AUC) reached 0.9748 for both classes, while the Matthews correlation coefficient (MCC) and Cohen's kappa coefficient were 0.9134 and 0.9131, respectively, demonstrating its reliability in detecting CHD. These results indicate that the model provides a highly accurate and robust non-invasive diagnostic tool for coronary heart disease.
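The stated architecture — three hidden layers of 60 neurons feeding a two-class output — maps directly onto scikit-learn's `MLPClassifier`. The sketch below uses synthetic stand-in data (the urinary-peptide features and SMOTE balancing from the paper are not reproduced here) and reports the same MCC metric the abstract cites; sklearn adds the two-unit softmax output layer automatically for a binary target.

```python
# Sketch of an MLP with the reported topology (60-60-60 hidden units,
# two-class output) on synthetic stand-in data, not the paper's cohort.
from sklearn.datasets import make_classification
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 690 samples / 50 features mirrors the cohort size and biomarker count.
X, y = make_classification(n_samples=690, n_features=50,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(60, 60, 60), max_iter=1000,
                  random_state=0),
)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
mcc = matthews_corrcoef(y_te, clf.predict(X_te))
print(f"test accuracy: {acc:.3f}, MCC: {mcc:.3f}")
```

For class balancing as described, the paper's SMOTE step would typically be added via `imblearn.over_sampling.SMOTE` on the training split before fitting.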
- Oceania > Australia (0.04)
- North America > United States > Michigan (0.04)
- North America > Mexico (0.04)
- (3 more...)
- Research Report > Experimental Study (0.69)
- Research Report > New Finding (0.46)
Summarize-Exemplify-Reflect: Data-driven Insight Distillation Empowers LLMs for Few-shot Tabular Classification
Yuan, Yifei, Li, Jiatong, Zhang, Weijia, Aliannejadi, Mohammad, Kanoulas, Evangelos, Hu, Renjun
Recent studies show the promise of large language models (LLMs) for few-shot tabular classification but highlight challenges due to the variability in structured data. To address this, we propose distilling data into actionable insights to enable robust and effective classification by LLMs. Drawing inspiration from human learning processes, we introduce InsightTab, an insight distillation framework guided by principles of divide-and-conquer, easy-first, and reflective learning. Our approach integrates rule summarization, strategic exemplification, and insight reflection through deep collaboration between LLMs and data modeling techniques. The obtained insights enable LLMs to better align their general knowledge and capabilities with the particular requirements of specific tabular tasks. We extensively evaluate InsightTab on nine datasets. The results demonstrate consistent improvement over state-of-the-art methods. Ablation studies further validate the principle-guided distillation process, while analyses emphasize InsightTab's effectiveness in leveraging labeled data and managing bias.
- North America > United States > California (0.04)
- Asia > China (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Promising Solution (0.66)
- Health & Medicine > Therapeutic Area > Endocrinology (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Education > Educational Setting (0.93)
- Banking & Finance (0.68)
Enhancement of Quantum Semi-Supervised Learning via Improved Laplacian and Poisson Methods
Gholipour, Hamed, Bozorgnia, Farid, Mohammadigheymasi, Hamzeh, Hambarde, Kailash, Mancilla, Javier, Proenca, Hugo, Neves, Joao, Challenger, Moharram
This paper develops a hybrid quantum approach for graph-based semi-supervised learning to enhance performance in scenarios where labeled data is scarce. We introduce two enhanced quantum models, the Improved Laplacian Quantum Semi-Supervised Learning (ILQSSL) and the Improved Poisson Quantum Semi-Supervised Learning (IPQSSL), that incorporate advanced label propagation strategies within variational quantum circuits. These models utilize QR decomposition to embed graph structure directly into quantum states, thereby enabling more effective learning in low-label settings. We validate our methods across four benchmark datasets -- Iris, Wine, Heart Disease, and German Credit Card -- and show that both ILQSSL and IPQSSL consistently outperform leading classical semi-supervised learning algorithms, particularly under limited supervision. Beyond standard performance metrics, we examine the effect of circuit depth and qubit count on learning quality by analyzing entanglement entropy and Randomized Benchmarking (RB). Our results suggest that while some level of entanglement improves the model's ability to generalize, increased circuit complexity may introduce noise that undermines performance on current quantum hardware. Overall, the study highlights the potential of quantum-enhanced models for semi-supervised learning, offering practical insights into how quantum circuits can be designed to balance expressivity and stability. These findings support the role of quantum machine learning in advancing data-efficient classification, especially in applications constrained by label availability and hardware limitations.
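The classical Laplacian label-propagation baselines that the quantum models are measured against can be run in a few lines with scikit-learn. This sketch (not the paper's ILQSSL/IPQSSL code) reproduces the low-label setting on Iris: only 5 labels per class are kept, the rest are marked unlabeled with -1, and labels spread over a k-NN graph.

```python
# Classical graph-based semi-supervised baseline: label spreading on a
# k-NN graph of Iris, with most labels hidden (not the paper's code).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Keep 5 labels per class; -1 marks an unlabeled point.
y_train = np.full_like(y, -1)
labeled = np.concatenate(
    [rng.choice(np.where(y == c)[0], size=5, replace=False) for c in (0, 1, 2)]
)
y_train[labeled] = y[labeled]

model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_train)

# Transductive accuracy: how well labels propagated to the hidden points.
hidden = y_train == -1
acc = (model.transduction_[hidden] == y[hidden]).mean()
print(f"accuracy on the unlabeled points: {acc:.3f}")
```

With only 10% of labels retained, this kind of graph-based propagation already performs well on Iris, which is why improvements in the low-label regime are the relevant comparison for the quantum variants.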
- Europe > Portugal (0.04)
- Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.04)
- Africa > Mozambique > Sofala Province > Beira (0.04)
- (7 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)